This is a simple example of ensemble clustering using Python and the scikit-learn library.
Ensemble clustering combines the results of multiple clustering algorithms, or multiple runs of the same algorithm, into a single consensus partition. The goal is a clustering solution that is more robust and reliable than any individual base clustering, obtained by exploiting the diversity among them.
Key concepts of ensemble clustering:
- Base clusterings: the individual partitions produced by different algorithms, parameter settings, or random initializations.
- Diversity: the base clusterings should disagree in useful ways; identical partitions add nothing to the ensemble.
- Consensus function: the rule that merges the base partitions into one result, for example through a co-association matrix (see the small sketch below).
- Robustness: the consensus partition is typically less sensitive to the quirks of any single algorithm or run.
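The consensus idea is easiest to see on a toy example. The sketch below uses two hand-made label vectors (values chosen purely for illustration) to build a co-association matrix, the same structure the full example constructs later on.

import numpy as np

# Two base clusterings of five points, encoded as label vectors
# (hand-picked values, for illustration only)
labels_a = np.array([0, 0, 1, 1, 2])
labels_b = np.array([0, 0, 0, 1, 1])

# Co-association matrix: fraction of base clusterings that place each
# pair of points in the same cluster
co = ((labels_a[:, None] == labels_a[None, :]).astype(float) +
      (labels_b[:, None] == labels_b[None, :]).astype(float)) / 2
print(co)
# Points 0 and 1 co-occur in both clusterings (entry 1.0), while points
# 2 and 3 co-occur in only one of them (entry 0.5)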
Python Source Code:
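Scikit-learn does not ship a ready-made ensemble clustering estimator, so the listing below implements the combination step directly with a co-association (consensus) matrix: each base algorithm is run once, the matrix records how often each pair of points is grouped together, and a final agglomerative step extracts the consensus partition. Treat this as a sketch of one common consensus scheme rather than the only way to build the ensemble.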
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score
# Generate synthetic data with three clusters
X, y = make_blobs(n_samples=300, centers=3, random_state=42)
# Define base clustering algorithms
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
agg_clustering = AgglomerativeClustering(n_clusters=3)
# Run each base algorithm and collect its cluster labels
base_labels = [
    kmeans.fit_predict(X),
    agg_clustering.fit_predict(X),
]
# Build the co-association matrix: entry (i, j) is the fraction of
# base clusterings that place points i and j in the same cluster
n_samples = X.shape[0]
co_association = np.zeros((n_samples, n_samples))
for labels in base_labels:
    co_association += (labels[:, None] == labels[None, :]).astype(float)
co_association /= len(base_labels)
# Extract the consensus partition by clustering the co-association matrix,
# treating 1 - co_association as a precomputed distance
# (on scikit-learn < 1.2, pass affinity='precomputed' instead of metric)
consensus = AgglomerativeClustering(n_clusters=3, metric='precomputed',
                                    linkage='average')
ensemble_labels = consensus.fit_predict(1 - co_association)
# Evaluate the performance using Adjusted Rand Index (ARI)
ari_score = adjusted_rand_score(y, ensemble_labels)
print(f'Adjusted Rand Index (ARI) of Ensemble Clustering: {ari_score:.2f}')
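# Optional comparison: ARI of each base clustering on its own, to check
# whether the consensus is at least as consistent with the ground truth
# (the names below simply label the two base estimators defined above)
for name, labels in zip(['kmeans', 'agg_clustering'], base_labels):
    print(f'Adjusted Rand Index (ARI) of {name}: {adjusted_rand_score(y, labels):.2f}')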
# Plot the ground-truth labels and the ensemble clustering results
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', edgecolors='k')
plt.title('Ground-Truth Clusters')
plt.subplot(1, 2, 2)
plt.scatter(X[:, 0], X[:, 1], c=ensemble_labels, cmap='viridis', edgecolors='k')
plt.title('Ensemble Clustering')
plt.show()
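Note that the consensus step above is only one choice: treating 1 - co_association as a distance matrix and applying average-linkage agglomerative clustering. Other consensus functions, such as graph partitioning of the co-association matrix or majority voting after aligning cluster labels across runs, are common alternatives.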
Explanation: